Generalized Reinforcement Learning for Manipulation Skills – Combining Low-dimensional Bayesian Optimization with High-dimensional Motion Optimization

Authors

Peter Englert

Marc Toussaint

Abstract

This paper addresses the problem of how a robot can autonomously improve a manipulation skill in an efficient and secure manner. Instead of using the standard reinforcement learning formulation, where all objectives are defined in a single reward function, we propose a generalized formulation that consists of three components: 1) a known analytic cost function; 2) a black-box reward function; 3) a black-box binary success constraint. While optimization of the analytic cost function is inherently high-dimensional, in typical robot manipulation problems we may assume that the black-box reward and constraint depend only on a lower-dimensional projection of the policy. Our formulation lets us exploit this structure and propose a sample-efficient learning framework that iteratively improves the skill with respect to the objective functions under the condition that the success constraint is fulfilled. The analytic cost function is optimized with motion optimization methods over the high-dimensional policy while the lower-dimensional parameters are held fixed. The black-box reward is optimized with constrained Bayesian optimization over the lower-dimensional parameters. During both improvement steps, the success constraint is used to keep the optimization in a secure region and to clearly distinguish between motions that lead to success and those that lead to failure. The learning algorithm is evaluated on simulated benchmark problems and on real-world tasks such as opening a door with a PR2.
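
The alternation between the two improvement steps can be sketched in a few dozen lines of Python. This is a toy illustration under stated assumptions, not the authors' implementation: the policy is a 1-D trajectory `x` of `T + 1` waypoints, the low-dimensional parameter `theta` is its endpoint, the analytic cost is the sum of squared finite-difference velocities, and the hypothetical `rollout` function replaces the real robot, returning a success flag and a reward that depend only on `x[-1]`. The success model (Gaussian-process regression on +/-1 labels, a crude stand-in for a GP classifier), the UCB acquisition restricted to candidates whose success lower confidence bound is positive, and all numeric constants are assumptions chosen for illustration.

```python
import math

# --- High-dimensional step: motion optimization with theta held fixed ------
def analytic_cost(x):
    """Known analytic cost: sum of squared finite-difference velocities."""
    return sum((b - a) ** 2 for a, b in zip(x, x[1:]))

def motion_optimize(theta, T=10, iters=1000, lr=0.2):
    """Gradient descent on analytic_cost over the interior waypoints x[1..T-1].
    The low-dimensional parameter theta fixes the endpoint x[T]; x[0] = 0."""
    x = [0.0] * T + [theta]
    for _ in range(iters):
        grad = [0.0] * (T + 1)
        for t in range(1, T):
            grad[t] = 2.0 * (2.0 * x[t] - x[t - 1] - x[t + 1])
        for t in range(1, T):
            x[t] -= lr * grad[t]
    return x

# --- Hypothetical black-box rollout (stands in for the real robot) ---------
def rollout(x):
    """Success and reward depend only on the projection x[-1] (e.g. a final
    handle angle); the values 0.3, 0.9 and 0.7 are made up for illustration."""
    g = x[-1]
    success = 0.3 <= g <= 0.9
    return success, (-((g - 0.7) / 0.2) ** 2 if success else None)

# --- Minimal Gaussian-process regression (RBF kernel, pure Python) ---------
def kernel(a, b, ell=0.15):
    return math.exp(-0.5 * ((a - b) / ell) ** 2)

def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    z = [0.0] * n
    for r in reversed(range(n)):
        z[r] = (M[r][n] - sum(M[r][k] * z[k] for k in range(r + 1, n))) / M[r][r]
    return z

def gp_posterior(X, y, q, prior_mean=None, noise=1e-3):
    """Posterior mean and std at q; the prior mean defaults to mean(y)."""
    m = sum(y) / len(y) if prior_mean is None else prior_mean
    K = [[kernel(a, b) + (noise if i == j else 0.0) for j, b in enumerate(X)]
         for i, a in enumerate(X)]
    k = [kernel(q, a) for a in X]
    alpha = solve(K, [yi - m for yi in y])
    v = solve(K, k)
    var = kernel(q, q) - sum(ki * vi for ki, vi in zip(k, v))
    return m + sum(ki * ai for ki, ai in zip(k, alpha)), math.sqrt(max(var, 1e-12))

# --- Low-dimensional step: constrained acquisition over theta --------------
def next_theta(data, candidates, beta=2.0):
    """UCB on the reward GP, restricted to candidates whose success GP has a
    positive lower confidence bound (assumed constraint handling)."""
    thetas = [t for t, _, _ in data]
    labels = [1.0 if s else -1.0 for _, s, _ in data]
    ok = [(t, r) for t, s, r in data if s]
    best, best_score = None, -math.inf
    for c in candidates:
        s_mu, s_sd = gp_posterior(thetas, labels, c, prior_mean=0.0)
        if s_mu - s_sd <= 0.0:
            continue  # not confidently inside the success region
        r_mu, r_sd = gp_posterior([t for t, _ in ok], [r for _, r in ok], c)
        if r_mu + beta * r_sd > best_score:
            best, best_score = c, r_mu + beta * r_sd
    if best is None:  # nothing confidently safe: fall back to the best success
        best = max(ok, key=lambda tr: tr[1])[0]
    return best

def learn(theta0=0.5, n_iters=12):
    """Alternate the two steps; theta0 is assumed to succeed (e.g. a demo)."""
    candidates = [i / 50 for i in range(51)]
    data = []  # (theta, success, reward or None)
    theta = theta0
    for _ in range(n_iters):
        x = motion_optimize(theta)            # high-dim step, theta fixed
        success, reward = rollout(x)          # one black-box evaluation
        data.append((theta, success, reward))
        theta = next_theta(data, candidates)  # low-dim constrained BO step
    best = max((d for d in data if d[1]), key=lambda d: d[2])
    return best[0], motion_optimize(best[0]), data
```

A call such as `best_theta, x_best, data = learn()` returns the best successful parameter found, its motion-optimized trajectory, and the full evaluation history. Keeping the two steps separate means the GPs only ever model a one-dimensional input, which is the structural assumption the abstract relies on: the sample-expensive black-box search stays low-dimensional, while the high-dimensional trajectory is handled by cheap gradient steps on the known cost.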

Similar articles

Combined Optimization and Reinforcement Learning for Manipulation Skills

This work addresses the problem of how a robot can improve a manipulation skill in a sample-efficient and secure manner. As an alternative to the standard reinforcement learning formulation where all objectives are defined in a single reward function, we propose a generalized formulation that consists of three components: 1) A known analytic control cost function; 2) A black-box return functio...

Combining Trajectory Optimization, Supervised Machine Learning, and Model Structure for Mitigating the Curse of Dimensionality in the Control of Bipedal Robots

To overcome the obstructions imposed by high-dimensional bipedal models, we embed a stable walking motion in an attractive low-dimensional surface of the system’s state space. The process begins with trajectory optimization to design an open-loop periodic walking motion of the high-dimensional model and then adding to this solution a carefully selected set of additional open-loop trajectories ...

Bayesian Optimization for Contextual Policy Search*

Contextual policy search allows adapting robotic movement primitives to different situations. For instance, a locomotion primitive might be adapted to different terrain inclinations or desired walking speeds. Such an adaptation is often achievable by modifying a relatively small number of hyperparameters; however, learning when performed on an actual robotic system is typically restricted to a ...

Learning Dynamic Manipulation Skills under Unknown Dynamics with Guided Policy Search

Planning and trajectory optimization can readily be used for kinematic control of robotic manipulation. However, planning dynamic motor skills requires a detailed physical simulation, and some aspects of the task, such as contacts, are very difficult to simulate with enough accuracy for dynamic manipulation. Alternatively, manipulation skills can be learned from experience, allowing them to def...

Injection Optimization for Heavy Duty Diesel Engine in Order to Find High Efficiency and Low NOx Engine Concept by Means of Quasi Dimensional Multi-Zone Spray Modeling

The purpose of this study is to investigate the effect of injection parameters on the performance and emission characteristics of a heavy-duty diesel engine. In order to analyze the injection and spray characteristics of diesel fuel employing a high-pressure common-rail injection system, the injection characteristics such as injection delay, injection duration, injection rate, number of nozzle hole...

Publication date: 2015
